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ABSTRACT 

Identification  of  ice  floes  and  their  outlines  in  satellite  images  is  impor¬ 
tant  for  understanding  physical  processes  in  the  polar  regions,  for  transporta¬ 
tion  in  ice-covered  seas  and  for  the  design  of  offshore  structures  intended  to 
survive  in  the  presence  of  ice.  At  present  this  is  done  manually,  a  long  and 
tedious  process  which  precludes  full  use  of  the  great  volume  of  relevant 
images  now  available. 

We  describe  an  automatic  and  accurate  method  for  identifying  ice  floes 
and  their  outlines.  Floe  outlines  are  modeled  as  closed  principal  curves  (Has- 
tie  and  Stuetzle,  1989),  a  flexible  class  of  smooth  non-parametric  curves.  We 
propose  a  robust  method  of  estimating  closed  principal  curves  which  reduces 
both  bias  and  variance.  Initial  estimates  of  floe  outlines  come  from  the 
erosion-propagation  (EP)  algorithm,  which  combines  erosion  from  mathemati¬ 
cal  morphology  with  local  propagation  of  information  about  floe  edges. 

The  edge  pixels  from  the  EP  algorithm  are  grouped  into  floe  outlines 
using  a  new  clustering  algorithm.  This  extends  existing  clustering  methods  by 
allowing  groups  to  be  centered  about  principal  curves  rather  than  points  or 
lines.  This  may  open  the  way  to  efficient  feature  extraction  using  cluster 
analysis  in  images  more  generally.  The  method  is  implemented  in  an  object- 
oriented  programming  environment  for  which  it  is  well  suited,  and  is  quite 
computationally  efficient. 

KEYWORDS:  'Erosion;  Feature  extraction;  LANDS  AT;  Non-parametric 
curves)  Object-oriented  programming;  Robustness. 
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1.  INTRODUCTION 

Knowledge  of  the  shapes,  sizes  and  spatial  distribution  of  ice  floes  is  important  for 
understanding  the  physical  processes  operating  on  the  ice  pack  in  the  polar  regions.  It  is  also 
important  for  practical  problems  associated  with  transportation  in  ice-covered  seas  and  for  the 
design  of  offshore  structures  intended  to  survive  in  the  presence  of  ice. 

Such  information  can  be  found  in  satellite  images  of  the  polar  regions  such  as  Figure  1, 
which  exist  in  large  and  rapidly  increasing  numbers.  Practical  use  of  such  images  requires 
identification  of  the  outlines  of  ice  floes  above  a  certain  size.  To  date  this  has  been  done 
manually  (Rothrock  and  Thorndike,  1984),  a  slow  and  tedious  process  that  often  takes  a  day 
or  more  to  record  the  data  from  a  single  image  and  effectively  precludes  full  use  of  the  data. 
Automating  the  process  is  inherently  difficult.  Problems  include  the  presence  of  many  smaller 
floes  and  of  melt  ponds  on  the  surface  of  floes  which  ensure  that  floes  often  do  not  appear  as 
homogeneous  blocks  of  ice  in  the  image. 

In  this  article  we  describe  an  automatic  method  for  identifying  the  outlines  of  ice  floes. 
The  outcome  of  this  is  shown  in  Figure  2,  and  is  almost  the  same  as  the  result  of  very  careful 
manual  digitization.  We  model  ice  floe  outlines  as  closed  principal  curves  (Hastie  and  Stuet- 
zle,  1989  -  hereafter  HS),  a  flexible  family  of  one-dimensional  non-parametric  curves  in  a 
higher-dimensional  space.  Our  method  consists  of  identifying  a  set  of  edge  pixels  and  group¬ 
ing  them  into  clusters  which  are  centered  about  a  principal  curve.  Each  cluster  corresponds  to 
a  floe  and  the  corresponding  principal  curve  is  the  estimated  floe  outline. 

The  method  involves  several  new  statistical  techniques: 

(1)  A  way  of  estimating  closed  principal  curves  that  reduces  both  bias  and  variance  and  is 

robust  to  outliers.  Here,  outliers  take  the  form  of  melt  ponds  on  the  surface  of  ice  floes 

(Section  2). 

(2)  The  erosion-propagation  (EP)  algorithm  provides  initial  estimates  of  floe  outlines.  This 

combines  the  existing  idea  of  erosion  fror..  mathematical  morphology  (Matheron  1975; 

Serra  1982)  with  that  of  local  propagation  of  information  about  floe  boundaries  (Section 

3). 
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Figure  I.  A  polar  LANDS  AT  image  showing  ice  Hoes.  This  is  a  200x200  pixel  image, 


where  each  pixel  is  80m  square;  it  thus  represents  a  15x15  km  area. 
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Figure  2.  The  ice  floe  outlines,  larger  than  a  fixed  minimum  size,  found  by  our  procedure  for 
the  data  in  Figure  1. 
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(3)  A  method  for  clustering  about  principal  curves.  Existing  clustering  algorithms  separate 
data  into  groups,  each  of  which  is  clustered  about  some  central  point  (Gordon  1981, 
1987;  Murtagh  1985;  Committee  on  Applied  and  Theoretical  Statistics  1989).  Here  we 
generalize  this  to  allow  each  group  to  be  clustered  about  a  different  principal  curve.  This 
opens  the  possibility  that  cluster  analysis  may  be  useful  more  generally  for  fast  feature 
extraction  in  images  (Section  4). 

The  method  is  implemented  in  an  object-oriented  programming  environment  for  which  it  is 
well  suited,  and  seems  computationally  efficient. 

2.  ESTIMATING  CLOSED  PRINCIPAL  CURVES 

In  this  section,  we  first  review  the  definition  and  basic  properties  of  principal  curves 
(Section  2.1).  We  then  describe  a  new  algorithm  for  estimating  closed  principal  curves  that 
reduces  both  bias  and  variance  and  is  robust  (Section  2.2). 

2.1  Principal  curves 

A  principal  curve  is  a  smooth  one-dimensional  curve  that  passes  through  the  middle  of 
an  m  -dimensional  data  set,  such  as  that  shown  in  Figure  3,  for  which  m=  2.  It  is  non- 
parametric  and  its  shape  is  suggested  by  the  data;  it  thus  provides  a  non-linear  summary  of 
the  data.  The  idea  was  introduced  and  developed  by  Hastie  (1984),  Hastie  and  Stuetzle  (1985) 
and  HS. 

A  one-dimensional  curve  in  m  -space  is  an  m  -vector  consisting  of  m  functions  of  a  sin¬ 
gle  variable  X,  called  coordinate  functions.  The  variable  X  parameterizes  the  curves  and  pro¬ 
vides  an  ordering  along  it;  X  will  often  be  arc-length  along  the  curve.  Let  X  e  R”'  be  a  con¬ 
tinuous  random  vector.  Then  f(X)  is  a  principal  curve  of  X  if 

E[X  lr1(X)  =  Xj  =  X, 


rl(x)  =  max  {  X:  ||x-f(A,)||  =  inf  ||x-f(p)||  ). 
X  H 


where 


oo  o 
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Figure  3.  Data  for  illustrating  the  fitting  of  closed  principal  curves.  The  data  were  obtained 
by  generating  points  uniformly  on  the  circumference  of  a  circle,  and  perturbing  them  ran¬ 
domly  along  the  normal  to  the  circle  according  to  a  Gaussian  distribution. 
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Given  the  distribution  of  X,  HS  proposed  the  following  algorithm  for  finding  f: 

f,+,(A.)  =  £|x  lf"l(X)  =  \|,  (2.1) 

where  f,  is  the  ith  iterate. 

When  the  distribution  of  X  is  unknown,  this  extends  to  estimation  of  f  from  a  data  set 
(xj,  .  .  .  ,x„  }  by  estimating  £[x  lf;-1(X)  =  Xl.  HS  do  this  by  means  of  scatterplot  smoothing, 
using  neighborhoods  of  each  point  defined  by  their  projections  onto  the  current  estimate  of  the 
principal  curve,  rather  than  by  their  position  in  Rm . 

Let  f,  be  the  ith  iterate  and  let  Xj  =  fj_1(x;- )  (_/  =  1.  •  •  .  ,«).  Let  Xj  be  the  j th  order 

statistic  of  the  set  (X{,  .  .  .  ,  X^  ),  and  let  be  the  data  point  that  projects  onto  X/.  Then  let 
Nj  be  the  set  of  data  points  x^  such  that  X/  is  in  a  neighborhood  of  Xj;  the  size  of  the  neigh¬ 
borhood  is  controlled  by  the  span,  equal  to  the  fraction  of  all  the  data  points  which  are  in  the 
neighborhood.  The  next  iterate  is  then 

f,+1(X)  =  £[XlN').  (2.2) 

This  is  calculated  using  a  coordinate  wise  scatterplot  smoother,  and  various  possibilities  are 
discussed  by  HS. 

There  is  no  formal  proof  that  the  algorithm  converges,  but  HS  report  that  they  have  had 
no  convergence  problems  with  more  than  40  real  and  simulated  examples.  Figure  4  shows 
the  result  of  applying  the  HS  estimation  procedure  to  the  data  in  Figure  3. 

2.2  An  algorithm  for  estimating  closed  principal  curves  that  reduces  both  bias  and  vari¬ 
ance  and  is  robust 

Scatterplot  smoothers  generally  produce  curves  that  are  biased  towards  the  center  of  cur¬ 
vature.  It  is  clear  from  Figure  4  that  the  estimated  principal  curve  suffers  from  this  problem, 
and  that  the  bias  increases  with  the  span.  In  most  statistical  problems  there  is  a  tradeoff 
between  bias  on  one  hand,  and  smoothness  and  variance  on  the  other.  For  closed  curves,  the 
center  of  curvature  is  interior  to  the  curve  (except  for  small  local  regions  in  non-convex 
curves).  We  can  use  this  fact  to  modify  the  HS  principal  curve  estimation  algorithm  so  as  to 
reduce  bias  and  variance  at  once. 
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If  the  distribution  of  X  is  known,  let  5'(X,X)  =  fj+](A.)-f,  (X)  be  the  change  in  the  calcu¬ 
lated  curve  from  iteration  /  to  ’teration  i  +  1  in  the  algorithm  (2.1).  Then 

5'(X,A.)  =  £(X-f,(X)  I  f,_1(X)  =  Xj. 

This  suggests  that  when  the  distribution  is  unknown  and  is  estimated  from  the  data  using  an 
iterative  algorithm  such  as  (2.2),  the  projections  of  the  data  in  N‘,  rather  than  the  data  them¬ 
selves,  should  be  used  to  calculate  ?,+1(X).  Let  $,•  =  be  the  change  in  the  esti¬ 
mate  of  f  from  the  ith  to  the  (/  +  l)th  iteration,  let  p‘  =  be  the  projection  of  xA ( 

onto  f, ,  and  let  p;  be  the  coordinatewise  average  of  the  projections  of  the  data  in  Nj  onto  f, . 
Then  we  calculate  f(+1  by  setting  &.•  =  py'. 

Figure  5  compares  this  algorithm  with  that  of  HS  for  various  spans.  Our  algorithm  does 
not  suffer  from  the  bias  problem  inherent  in  that  of  HS.  Figure  6  shows  that  our  algorithm 
produces  a  much  smoother  curve  when  the  span  is  small  enough  for  that  of  HS  to  yield  a 
relatively  unbiased  curve. 

Outliers  arise  in  the  form  of  shallow  but  sometimes  large  melt  ponds  on  the  surface  of 
the  ice  floe.  An  example  of  this  is  Figure  7,  which  shows  the  edge  pixels  of  one  floe  in  Figure 
1  identified  by  the  EP  algorithm.  The  points  near  the  left  side  interior  to  the  floe  are  from 
melt  ponds.  Since  they  do  not  belong  to  the  edge  of  the  floe,  we  need  to  ensure  that  they  not 
affect  the  estimate  of  the  principal  curve. 

To  eliminate  the  effect  of  outliers  we  use  a  slight  modification  of  an  approach  suggested 
by  HS.  In  the  scatterplot  smoothers  we  use  a  weighted  average,  where  the  weight  of  a  point 
depends  on  its  distance  from  the  current  estimate  of  the  principal  curve.  We  calculate  the 
standard  deviation  of  the  lengths  of  the  projections  and  set  the  weight  for  a  point  to  zero  if  it 
is  more  than  three  standard  deviations  from  the  current  estimate  of  the  principal  curve,  and  to 
unity  otherwise.  Figure  8  shows  the  result  of  this  robust  procedure  for  the  data  in  Figure  7. 
as  well  as  that  of  the  non-robust  procedure  which  uses  the  mean  of  all  the  data  in  each  neigh 
borhood.  The  robust  procedure  has  clearly  achieved  its  goal. 


Figure  5.  Comparison  of  the  HS  algorithm  for  estimating  closed  principal  curves  with  that 
proposed  here.  This  shows  the  principal  curves  resulting  from  the  HS  algoritlim  (thin  line)  and 
the  algorithm  proposed  here  with  spans  of  (A)  0.2,  (B)  0.3  and  (C)  0.5. 


Figure  6.  The  principal  curve  from  the  HS  algorithm  (thin  line)  at  a  span  small  enough  to 
eliminate  most  of  the  bias,  compared  with  the  estimate  proposed  here  (thick  line)  with  a  span 
of  0.2.  The  smaller  the  span,  the  less  the  bias  and  the  rougher  the  estimated  curve.  Notice 
how  rough  the  HS  curve  is,  compared  with  the  present  estimate. 
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Figure  7.  Ice  floe  edge.,  wi.h  melt  ponds.  The  small  circles  are  ihe  edge  pixels  for  one  of  ,he 

floes  in  Figure  1.  as  identified  by  the  EP  algorithm.  The  points  intenor  to  the  floe  me  from 
melt  ponds. 
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Figure  8.  Principal  curve  estimated  using  the  robust  procedure  described  in  the  text  (thick 
line),  compared  with  the  estimate  from  the  non-robust  procedure  (thin  line).  The  robust  esti¬ 
mate  is  unaffected  by  the  melt  ponds,  while  the  non-robust  estimate  is  pulled  towards  them. 
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3.  THE  EROSION-PROPAGATION  (EP)  ALGORITHM 

To  select  the  potential  edge  pixels  and  provide  an  initial  grouping  of  them  into  floe  out¬ 
lines,  we  use  the  EP  algorithm.  This  operates  on  binary  images.  However,  images  of  ice  floes 
such  as  Figure  1  are  usually  greyscale.  The  marginal  distribution  of  pixel  intensities  is  highly 
bimodal,  and  so  we  work  with  the  simpler  binary  image  obtained  by  thresholding  the  original 
image;  see  Figure  9.  The  final  result  is  relatively  insensitive  to  the  precise  choice  of  threshold. 

The  erosion  part  of  the  EP  algorithm,  which  identifies  the  potential  edge  elements,  is  a 
standard  application  of  ideas  in  mathematical  morphology  (Serra,  1982).  The  propagation  part 
of  the  EP  algorithm  keeps  track  of  the  floe  to  which  an  edge  pixel  belongs  by  locally  pro¬ 
pagating  the  information  about  edge  elements  into  the  interior  of  the  floe  as  it  is  eroded.  This 
is  facilitated  by  the  object-oriented  programming  environment. 

The  algorithm  is  iterative  and  operates  on  a  binary  image  consisting  of  figures  (ice  floes) 
on  a  contrasting  background  (water).  At  the  first  iteration,  if  a  pixel  is  ice  and  a  specified  sub¬ 
set  of  its  neighbors  is  water,  the  pixel  is  "melted"  and  becomes  water,  so  that  the  figure  to 
which  it  belongs  is  eroded.  In  our  implementation,  a  pixel  is  "melted"  if  any  of  the  eight 
neighboring  pixels  is  water.  At  the  second  iteration,  the  same  operation  is  performed  on  the 
image  resulting  from  the  first  iteration,  and  so  on.  This  can  be  formally  described  in  terms  of 
structuring  elements  usmg  the  terminology  of  mathematical  morphology  (Banfield  1988).  Any 
edge  locating  operator  can  be  used  to  provide  the  initial  set  of  potential  edge  pixels.  We  use 
the  pixels  "melted"  at  the  first  iteration  of  the  EP  algorithm. 

Some  results  are  shown  in  Figure  10.  We  can  control  the  minimum  size  of  the  floes  by 
waiting  until  a  specified  number  of  iterations,  /min,  have  passed  before  recording  a  floe.  The 
smallest  floe  which  can  be  recorded  is  then  a  square  of  side  (2/min+l)  pixels.  Smaller  floes 
"melt"  and  are  not  recorded. 

The  idea  of  the  propagation  part  of  the  EP  algorithm  is  that  the  locations  of  the  edge 
pixels  are  propagated  towards  the  interior  of  the  figure  as  it  is  eroded.  At  the  end  of  the  pro¬ 
cess,  a  single  interior  point  of  the  figure  will  "know"  the  locations  of  all  the  edge  pixels  to 
which  it  corresponds.  The  location  information  is  passed  to  only  a  few  pixels  which  are  taken 
as  far  from  the  eroded  pixel  as  possible  subject  to  them  not  belonging  to  a  different  floe.  This 
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Figure  9.  Binaiy  version  of  Figure  1  for  two  threshold  levels.  The  results  are  similar.  How¬ 
ever,  (A)  has  a  lower  threshold  level  than  (B)  and  therefore  has  more  clutter  in  the  water  but 


less  noise  interior  to  the  floes. 


L'r 
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Figure  10.  The  results  of  applying  the  EP  algorithm  to  Figure  9(A)  after  (A)  3  iterations,  (B) 
7  iterations  and  (C)  1 2  iterations. 
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ensures  that  the  amount  of  location  information  to  be  processed  does  not  become  unmanage¬ 
able.  It  also  prevents  loss  of  information  due  to  irregularity  of  the  floe,  melt  ponds,  or  pixel 
misclassification  at  the  thresholding  stage. 

The  key  to  the  propagation  part  of  the  EP  algorithm  is  that  it  is  never  necessary  for  any 
pixel  to  know  the  direction  of  the  interior  of  the  floe.  If  a  pixel  is  eroded  at  the  the  /'  th  itera¬ 
tion,  then  it  was  the  center  of  a  (2 i  +  1)  by  (2i  +1)  square  of  ice  pixels  before  the  start  of  the 
erosion  process.  Thus  its  location  may  be  passed  anywhere  within  the  square  of  side  (2 i  +1) 
pixels  surrounding  it  without  any  risk  of  the  information  being  passed  to  another  floe.  We 
have  found  that  it  is  enough  to  pass  the  location  information  in  two  opposite  directions  within 
the  square.  All  of  the  eroded  pixels  are  processed  in  exactly  the  same  manner  and  it  is  this 
uniform  processing  that  allows  the  algorithm  to  be  implemented  on  parallel  processing 
machines. 

In  Figure  11  we  show  the  results  of  the  EP  algorithm  applied  to  the  data  in  Figure  1,  as 
thresholded  in  Figure  9(B)  with  a  minimum  floe  size  of  15x15  pixels  (i.e.  1.2  km.  square). 
The  results  are  reasonably  good:  of  the  35  floes  identified  by  the  EP  algorithm,  23  are  "right" 
in  the  sense  of  being  close  to  floes  identified  by  careful  manual  digitization. 

However,  the  EP  algorithm  tends  to  subdivide  floes.  This  can  occur  when  the  floes  are 
non-convex  or  when  they  have  noise  in  the  interior.  Figure  12  shows  an  example  of  the  non- 
convex  case.  As  the  floe  shown  in  Figure  12(A)  was  eroded,  the  narrow  middle  section  was 
pinched  in  and  the  floe  was  divided  into  two  partial  floes,  shown  in  Figures  12(B,  C).  Figure 
13(A)  is  an  example  of  a  floe  with  melt  ponds  that  cause  the  EP  algorithm  to  produce  three 
partial  floes,  as  seen  in  Figures  13(B,  C,  D). 

4.  CLUSTERING  ABOUT  CLOSED  PRINCIPAL  CURVES 

The  EP  algorithm  tends  to  subdivide  floes.  We  have  therefore  developed  a  method  for 
determining  which  of  the  floes  identified  by  the  EP  algorithm  should  be  merged,  based  on  an 
algorithm  for  clustering  about  closed  principal  curves.  Since  we  want  to  find  out  whether  to 
merge  tentatively  identified  floes,  this  is  hierarchical  and  agglomerative. 
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Figure  11.  Result  of  the  EP  algorithm  applied  to  Figure  9(A),  where  floes  are  not  recorded 
unless  they  have  survived  at  least  7  iterations.  This  corresponds  to  a  minimum  floe  size  of 
15x  15  pixels,  or  1.2  km  square.  The  points  are  the  edge  elements  identified  by  the  EP  algo¬ 
rithm  and  the  numbers  interior  to  each  floe  are  the  centers  found  by  the  EP  algorithm.  Note 
that  centers  1  and  4  are  on  the  same  floe,  which  was  subdivided  because  of  the  melt  ponds. 
Other  floes  were  also  subdivided. 
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The  objective  of  cluster  analysis  is  to  group  a  set  of  observations  into  "interesting"  sub¬ 
sets.  In  practice,  this  has  usually  meant  grouping  observations  which  are  close  to  one  another 
Ward  (1963)  proposed  a  hierarchical  agglomerative  algorithm  for  dividing  data  into  g  groups 
such  that  the  sum  of  the  within-group  sums  of  squares  is  minimized.  The  algorithm  starts  by 
assigning  each  observation  to  a  separate  group.  At  each  agglomeration  two  groups  are 
merged,  chosen  so  as  to  minimize  the  increase  in  the  sum  of  within-group  sums  of  squares. 
This  clustering  criterion  is  optimal  if  the  data  are  generated  by  a  finite  mixture  of  spherical 
Normal  distributions.  This  corresponds  to  clusters  which  tend  to  be  of  the  same  size  and 

spherical 

Murtagh  and  Raftery  (1984)  proposed  decomposing  the  within-group  sum  of  squares  into 
parts  and  using  a  weighted  sum  of  the  parts  as  the  clustering  criterion,  with  weights  chosen  so 
as  to  emphasize  aspects  of  interest  in  the  application.  For  two-dimensional  data,  they  sug¬ 
gested  decomposing  the  within-group  sum  of  squares  into  parts  parallel  and  perpendicular  to 
the  first  principal  component  of  the  group  and  downweighting  the  parallel  part.  This  criterion 
was  generalized  by  Banfield  and  Raftery  (1989)  who  also  showed  that  it  is  optimal  when  the 
data  are  generated  by  a  mixture  of  Normal  distributions  with  covariance  matrices  whose 
eigenvalues  are  constant  across  clusters.  This  corresponds  to  clusters  which  tend  to  be  ellipti¬ 
cal  with  the  same  size  and  shape  but  different  orientations. 

We  now  apply  the  idea  of  decomposing  and  reweighting  the  within-group  sum  of  squares 
to  the  present  problem.  The  edge  pixels  for  an  ice  floe  that  has  not  been  subdivided  should  be 

(a)  tightly  clustered  about  the  floe  outline,  as  estimated  by  the  principal  curve,  and 

(b)  regularly  spaced  along  the  outline,  so  that  the  variance  of  the  distances  between  neigh¬ 
boring  edge  pixels  should  be  small;  see  Figure  14  and  Figure  15. 

We  decompose  the  variance,  V,  for  a  group  of  edge  pixels  into  parts  corresponding  to 
(a)  and  (b)  and  a  residual  part,  as  follows.  Let  the  locations  of  the  edge  pixels  be 

\j  0  =  1 . n),  ordered  so  that  f(kj )  =  x;  and  \j<  ■  •  -  where  ?  is  the  principal 

curve  of  the  group,  estimated  by  the  method  of  Section  2.2.  Let  E;  =f(X; )-  f(A.y+1)  be  the  vec¬ 
tor  defined  by  two  adjacent  projections  onto  the  estimated  principal  curve.  Since  we  are  work¬ 
ing  with  closed  curves,  all  subscripting  is  modulo  n  . 
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Figure  14.  A  complete  floe,  showing  the  edge  pixels  identified  by  the  EP  algorithm  (open  cir¬ 
cles)  and  their  projections  onto  the  estimated  principal  curve  (dark  circles).  The  projections 
onto  the  principal  curve  are  regularly  spaced,  and  so  the  variance  of  the  distance  between 
adjacent  projections  is  small. 
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Figure  15.  A  partial  floe.  There  are  large  gaps  between  the  projections  of  the  edge  pixels 
onto  the  estimated  principal  curve,  and  so  the  variance  of  the  distance  between  adjacent  pro¬ 
jections  is  large. 
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We  may  now  write  the  within  group  variance  as 
V  =  t  !i*,-x||; 

i-i 

~  ^ about  ^ along  ^ residual  • 

where  Vahout  =  £  ||x;  -f(X;)||2,  and  Valong  =’/2^  \\z]  -Z  ||2.  Vaboul  is  a  measure  of  lack  of 
;=1  j= i 

tightness  of  the  distribution  of  the  edge  pixels  about  the  floe  outline,  and  thus  of  the  extent  to 
which  requirement  (a)  is  not  satisfied.  Valong  is  a  measure  of  the  lack  of  regularity  of  the  dis¬ 
tribution  of  the  pixels  along  the  floe  outline,  and  thus  of  the  extent  to  which  requirement  (b) 
is  not  satisfied  It  is  shown  by  Banfield  (1988)  that 

VreM  =  HE,' II2  +  £<V*  ■>  I- 

)= 1  >=> 

where  r;  =  f( )  -  -  ]T  f(A.; )  and  •  denotes  the  dot  product.  It  follows  that  Vresidual  is  a  meas- 
"  *=l 

ure  of  the  overall  size  of  the  floe.  The  general  reweighted  within-group  variance  is  thus 

®  ^  about  P  ^ along  Y  ^ residual ' 

V residual  contains  no  information  of  interest  to  us  here,  so  we  set  y=0.  Also,  without  loss  of 
generality  we  set  P  =  1 .  Our  clustering  criterion  is  therefore 

V about  +  V along 

To  determine  whether  a  set  of  groups  should  be  merged,  we  first  calculate  V *  for  each  of 
the  individual  groups  and  then  we  calculate  V‘  for  the  union  of  the  groups.  If  the  union  has  a 
smaller  value  of  V*  than  any  of  the  individual  groups  we  merge  them,  and  otherwise  not.  To 
make  the  number  of  candidate  mergers  manageable,  we  note  that  if  a  floe  has  been  subdivided 
the  partial  floes  will  have  common  edge  pixels,  but  not  conversely.  Therefore,  only  mergers  of 
groups  with  common  edge  pixels  are  considered. 
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To  determine  a,  we  note  that  by  arguments  similar  to  those  of  Banfield  and  Rafter}' 
(1989)  V *  will  be  an  optimal  criterion,  conditional  on  the  estimated  principal  curves,  if  'he 
edge  pixels  are  normally  distributed  about  the  floe  outlines  and  if  E[Val  ]  =  at'[Vai)OI(( ]. 
We  therefore  estimate  a  as  the  average  of  V along  /  V about  f°r  the  floes  that  we  know  were  not 
subdivided,  namely  those  which  have  no  shared  edge  elements.  Using  a  span  of  0.3  to  esti¬ 
mate  the  principal  curves,  this  yielded  d  =  0.39. 

Table  1  shows  the  results  of  merging  the  partial  floes  in  Figure  11,  and  Table  2  shows 
the  results  of  trying  to  merge  floes  that  should  not  be  merged.  In  each  case  our  method  gives 
a  result  which  is  both  correct  and  clearcut.  Figure  16  shows  the  final  results  of  the  procedure, 
together  with  the  identified  edge  pixels. 

5.  DISCUSSION 

We  have  developed  an  automatic  method  for  finding  tire  outlines  of  ice  floes  in  satellite 
images  It  is  accurate  and  computationally  efficient.  It  involves  three  new  statistical  tech¬ 
niques:  a  way  of  estimating  closed  principal  curves  that  reduces  both  bias  and  variance  and  is 
robust,  the  EP  algorithm,  and  a  method  for  clustering  about  principal  curves. 

The  approach  would  seem  to  be  applicable  more  generally  to  the  detection  of  non-linear 
features  in  images  It  extends  cluster  analysis  to  the  case  where  similar  pixels  tend  to  be 
grouped  about  arbitrary  curved  features,  open  or  closed,  using  the  idea  of  decomposing  and 
reweighting  the  within-group  sum  of  squares  proposed  by  Murtagh  and  Raftery  (1984).  This 
suggests  that  cluster  analysis  may  be  useful  for  feature  extraction  in  images  more  generally. 

The  procedure  is  implemented  in  an  object-oriented  programming  environment.  One  of 
the  advantages  of  this  environment  is  that  each  floe  resulting  from  the  procedure  can  be 
represented  as  an  instance  of  a  "floe  object"  and  can  carry  with  it,  in  the  form  of  instance 
variables,  specific  information  about  the  floe  to  be  used  in  further  analysis.  It  is  relatively 
fast:  a  512x512  8-bit  image  can  be  analyzed  in  approximately  one  hour  on  a  Symbolics  3600 
and  should  be  considerably  faster  on  some  of  the  newer  workstations  (e  g.  about  20  minutes 
on  a  Sun-3/80  and  under  10  minutes  on  a  Sun  SPARCstation  1).  The  processing  time  is 
linear  ir  the  number  of  pixels,  but  does  depend  upon  the  complexity  of  the  image. 


Figure  16.  Principal  curves  of  the  floes  found  after  using  the  clustering  method  to  merge  the 
partial  floes  found  by  the  EP  algorithm  (solid  curves).  The  circles  are  the  edge  elements  found 
by  the  EP  algorithm. 
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In  our  implementation  of  the  EP  algorithm  we  erode  an  ice  pixel  if  any  of  its  eight 
neighbors  are  water.  Other  rules  for  eroding  a  floe  may  be  used  and  they  can  change  the  rate 
of  erosion,  the  effect  of  pixel  misclassification  and  the  shape  of  the  floes  that  can  be  found. 
The  rules  may  also  be  changed  as  the  erosion  procedes.  For  example,  dilating  the  image,  or 
"refreezing"  the  ice,  at  an  early  iteration  could  eliminate  some  pixel  misclassifications  and 
other  noise  in  the  interior  of  the  floes.  The  EP  algorithm  has  the  potential  of  being  imple¬ 
mented  on  parallel  processing  machines. 

To  date,  the  development  of  automated  techniques  for  the  analysis  of  polar  satellite 
images  has  been  limited  to  ice  floe  tracking  (Ninnis,  Emery  and  Collins  1986;  Fily  and 
Rothrock  1986,  1987;  Vesecky,  Samadani,  Smith,  Daida  and  Bracewell  1988).  The  primary 
tool  in  these  automated  tracking  methods  is  cross-correlation,  which  provides  the  ability  to 
match  regions  in  two  different  images,  but  does  not  give  any  information  about  the  morphol¬ 
ogy  of  the  individual  ice  floes  or  the  spatial  structure  of  the  ice  pack.  Vesecky  et  al.  (1988) 
use  segments  of  ice  floe  boundaries  to  track  ice  floe  movements,  but  this  does  not  provide  the 
type  of  information  needed  to  study  ice  floe  morphology  and  spatial  structure.  The  need  for 
more  inlormation  on  both  morphology  and  spatial  structure  was  clearly  shown  by  the  1984 
Marginal  Ice  Zone  Experiment  (Bums  et  al.  1987;  Campbell  et  al  1987). 

The  problem  considered  here  does  not  fall  neatly  into  one  of  the  problem  areas  in  image 
understanding  that  have  been  intensely  studied  in  recent  years,  namely  image  restoration, 
classification,  segmentation  and  feature  extraction.  It  does,  however,  combine  elements  from 
all  of  them.  Image  restoration  attempts  to  reconstruct  a  degraded  image.  Image  classification 
tries  to  assign  each  pixel  to  one  of  several  predetermined  categories;  it  may  be  regarded  as  a 
special  case  of  restoration.  Image  segmentation  (Rosenfeld  and  Kak,  1982;  Chellappa  and 
Sawchuk,  1985)  seeks  to  identify  areas  of  contiguous  pixels  that  are,  for  example,  devoted  to 
the  same  crop.  The  aim  of  feature  extraction  is  to  find  linear  or  curvilinear  features  in  images. 

Our  problem  shares  goals  with  feature  extraction,  segmentation  and  classification.  While 
restoration  is  not  an  explicit  goal,  we  would  expect  the  methods  developed  here  to  work  well 
in  the  presence  of  degradation.  Restoration  and  classification  methods  do  not,  by  themselves, 
address  the  present  problem.  Current  feature  detectors  would  seem  to  have  difficulty  locating 
features  as  arbitrary  as  ice  floes.  For  example,  the  Hough  transform  (Hough  1962;  Ballard 
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Table  1.  Floe  merger  results.  This  table  shows  the  values  of  the  merger  criterion  V*  for  the 
subdivided  floes  from  Figure  II.  The  value  of  the  criterion  for  the  individual  partial  floes  is 
shown  under  the  " Individual "  column.  The  value  under  the  " Merged "  column  is  the  criterion 
value  when  the  floes  in  the  left  column  are  merged.  The  floe  numbers  refer  to  the  center 
numbers  in  Figure  1 1 .  The  fact  that  the  value  under  the  " Merged "  column  is  smaller  than 
ony  of  the  values  under  the  " Individual'  column  indicates  that  the  floes  should  be  merged. 


Hoes 

Criterion  Values 

Individual  Merged 

Hoe  1 

11.36 

Hoe  4 

5.71  0.64 

Hoe  5 

4.07 

Hoe  8 

10.86  2.36 

Hoe  12 

8.29 

Hoe  2 

1.32 

Hoe  3 

1.36  .86 

Hoe  10 

6.95 

Hoe  6 

4.37 

Floe  1 1 

3.71  .60 

Hoe  25 

1.33 

Hoe  34 


1.90 


.33 
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Table  2.  Floe  non-merger  results.  This  table  shows  the  values  of  the  merger  criterion  V*  for 
non-subdivided  floes  from  Figure  11.  The  value  of  the  criterion  for  the  individual  partial 
foes  is  shown  under  the  "Individual"  column  The  value  under  the  ''Merged'  column  is  the 
criterion  value  when  the  floes  in  the  left  column  are  merged.  The  floe  numbers  refer  to  the 
center  numbers  in  Figure  11.  The  fact  that  the  value  under  the  "Merged'  column  is  larger 
than  anv  of  the  values  under  the  " Individual "  column  indicates  that  the  floes  should  not  be 
merged. 


Floes 

Criterion  Values 

Individual  Merged 

Floe  27 

.24 

Floe  13 

.39  10.70 

Floe  34 

1.90 

Floe  25 

1.33  7.41 

Floe  7 

.20 

Floe  30 

.59 

Floe  31 

.93  2.27 

1981;  Davis  1982),  an  obvious  candidate  for  locating  closed  curves,  requires  an  initial  pattern 
description  which  it  then  tries  to  find  in  the  image.  It  would  be  difficult  to  provide  an  initial 
pattern  description  that  is  general  enough  to  accommodate  the  wide  range  of  commonly  found 
ice  floe  shapes.  Our  approach  may  also  be  applicable  to  segmentation  problems,  especially 
those  concerned  with  identifying  not  only  regions  but  also  the  shapes  of  their  outlines. 

The  Bayesian  and  stochastic  relaxation  approach  of  Geman  and  Geman  (1984)  may  well 
be  applicable  to  the  present  problem,  although  it  has  to  date  been  used  mainly  for  restoration. 
It  would  require  extensive  modeling  assumptions  for  the  ice  floe  problem,  and  experience 


-  30  - 


suggests  that  it  would  be  computationally  expensive.  The  computational  burden  might  be 
reduced  by  using  as  an  approximation  the  ICM  algorithm  of  Besag  (1986),  although  this  does 
not  yet  appear  to  have  been  applied  to  problems  such  as  the  present  one.  Our  procedure  on 
the  other  hand  requires  only  the  assumption  that  the  ice  floe  boundaries  be  closed  curves,  and 
it  is  relatively  fast. 
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Identification  of  ice  floes  and  their  outlines  in  satellite  images  is 
important  for  understanding  physical  processes  in  the  polar  regions,  for 
transportation  in  ice-covered  seas  and  for  the  design  of  offshore  struc¬ 
tures  intended  to  survive  in  the  presence  of  ice.  At  present  this  is  done 
manually,  a  long  and  tedious  process  which  precludes  full  use  of  the  great 
volume  of  relevant  images  now  available. 
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We  describe  an  automatic  and  accurate  method  for  identifying  ice 
floes  and  their  outlines.  Floe  outlines  are  modeled  as  closed  prin¬ 
cipal  curves  (Hastie  and  Stuetzle,  1989),  a  flexible  class  of  smooth 
non-parametric  curves.  We  propose  a  robust  method  of  estimating 
closed  principal  curves  which  reduces  both  bias  and  variance. 

Initial  estimates  of  floe  outlines  come  from  mathematical  morphology 
with  local  propagation  of  information  about  floe  edges. 

The  edge  pixels  from  the  EP  algorithm  are  grouped  into  floe  out¬ 
lines  using  a  new  clustering  algorithm.  This  extends  existing  clus¬ 
tering  methods  by  allowing  groups  to  be  centered  about  principal 
curves  rather  than  points  or  lines.  This  may  open  the  way  to  effi¬ 
cient  feature  extraction  using  cluster  analysis  in  images  more  gen¬ 
erally.  The  method  is  implemented  in  an  object-oriented  programming 
environment  for  which  it  is  well  suited,  and  is  quite  computationally 
ef  f icient . 


